🕷️ Job Radar • SCRAPING

Live freelance tracking. Raw descriptions turned into structured data. Find your next tech project without the noise.

freelancer.com 🟡 2026-05-08

🔹 Scrape book details from multiple sources for ten thousand ISBNs and push data into a Google Sheets workbook.
👤 Client: 🇮🇳 Pune, India. Member since 2021-05-11
💰 Price: $9 / hr (average bid)
🚩 Problem: Automate the process of extracting book information from Amazon and external APIs while ensuring robustness against throttling, bot detection, and page format changes.
📦 Existing: Not specified

Specifications:

[Target] Scrape book details for ten thousand ISBNs.
[Method] Use a combination of Scrapy and Playwright with rotating residential proxies to handle Amazon's throttling and bot checks. Integrate external APIs for additional data sources.
[UI/UX] Not applicable
[Stack] Python, Scrapy, Playwright, Google Sheets API, external APIs (e.g., Open Library)
[Security] Implement rate limiting, use secure proxy management, and ensure data privacy during transmission.
[Format] Cleaned and normalized JSON before pushing to Google Sheets.
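
The [Format] requirement could be met with a small normalization step that runs before anything is written to the sheet. The sketch below is one possible approach, not the client's specification: it assumes ISBN-13 is the canonical key, and the function names and record fields (isbn, title, source) are hypothetical. It uses only the Python standard library.

```python
import re
from typing import Optional


def isbn13_check_digit(first12: str) -> str:
    """Return the ISBN-13 check digit for the first twelve digits."""
    # Weights alternate 1, 3, 1, 3, ... across the twelve digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def is_valid_isbn10(isbn: str) -> bool:
    """Validate an ISBN-10 (weights 10..1, sum divisible by 11, 'X' means 10)."""
    if not re.fullmatch(r"\d{9}[\dX]", isbn):
        return False
    total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(isbn))
    return total % 11 == 0


def normalize_isbn(raw: str) -> Optional[str]:
    """Strip hyphens/spaces and return a valid ISBN-13, or None if invalid."""
    cleaned = re.sub(r"[\s-]", "", raw).upper()
    if len(cleaned) == 10 and is_valid_isbn10(cleaned):
        # ISBN-10 maps to ISBN-13 by prefixing 978 and recomputing the check digit.
        core = "978" + cleaned[:9]
        return core + isbn13_check_digit(core)
    if re.fullmatch(r"\d{13}", cleaned) and isbn13_check_digit(cleaned[:12]) == cleaned[12]:
        return cleaned
    return None


def normalize_record(raw: dict) -> dict:
    """Build a cleaned record; the field names here are illustrative assumptions."""
    return {
        "isbn13": normalize_isbn(str(raw.get("isbn", ""))),
        # Collapse runs of whitespace left over from scraped HTML.
        "title": " ".join(str(raw.get("title", "")).split()),
        "source": raw.get("source"),
    }
```

Records whose isbn13 comes back as None could be routed to a review list rather than written to the sheet, so that a bad scrape does not overwrite a good row.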

Workflow:

1. Set up Scrapy project with custom middlewares for handling Amazon's throttling and bot checks.
2. Integrate Playwright to render dynamic Amazon pages, and query the external APIs (e.g., Open Library) directly over HTTP.
3. Implement proxy management using a rotating residential proxy service to avoid detection.
4. Develop a robust data cleaning and normalization module to handle various formats of scraped data.
5. Create a Google Sheets connector that inserts or updates rows atomically, preserving existing formulas.
6. Keep the codebase PEP 8 compliant, document it, and provide detailed setup instructions for macOS and Ubuntu.
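
Step 5 of the workflow (insert or update rows while preserving formulas) could be approached by planning the writes first and sending them in one batched request. The sketch below only builds the request body in the shape accepted by the Google Sheets API method spreadsheets.values.batchUpdate and makes no network call. The sheet name, column layout, and the plan_upsert helper are assumptions for illustration. It writes only the data columns it is given, so formula columns to the right are left untouched. Whether a single batched request meets the client's atomicity requirement should be checked against the Sheets API documentation.

```python
from typing import Dict, List, Sequence


def plan_upsert(
    existing_isbns: Sequence[str],
    records: List[Dict[str, str]],
    sheet: str = "Books",            # hypothetical sheet name
    first_data_row: int = 2,         # assumes row 1 holds headers
    columns: Sequence[str] = ("isbn13", "title", "source"),  # assumed layout, A..C
) -> dict:
    """Return a values.batchUpdate body that updates known ISBNs and appends new ones."""
    # Map each ISBN already in the key column to its 1-based sheet row.
    row_of = {
        isbn: first_data_row + i for i, isbn in enumerate(existing_isbns) if isbn
    }
    next_row = first_data_row + len(existing_isbns)
    last_col = chr(ord("A") + len(columns) - 1)  # valid for up to 26 columns

    data = []
    for rec in records:
        row = row_of.get(rec["isbn13"])
        if row is None:
            # New ISBN: claim the next free row and remember it, so a
            # duplicate later in the same batch updates instead of appending.
            row = next_row
            row_of[rec["isbn13"]] = row
            next_row += 1
        data.append(
            {
                "range": f"{sheet}!A{row}:{last_col}{row}",
                "values": [[rec.get(c, "") for c in columns]],
            }
        )
    return {"valueInputOption": "RAW", "data": data}
```

With the official Python client, the returned dictionary would be passed as the body argument of service.spreadsheets().values().batchUpdate(spreadsheetId=..., body=...), where existing_isbns comes from a prior read of the key column.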
